We have 3 years of daily sales data for Post-its, those lovable sticky notes that hang around your desk.

We will do the following:

  1. Determine which factors influence sales.

  2. Build a model to forecast daily sales.

# Install pacman if needed
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
# load packages
pacman::p_load(pacman,
  tidyverse, openxlsx, modeltime, parsnip, rsample, timetk, broom, ggthemes)
#Import data
postit <- read.xlsx("~/Documents/GitHub/Forecasting-in-R/Forecasting-in-R/datasets/POSTITDATA.xlsx", skipEmptyRows = TRUE)

#Check results
str(postit)
## 'data.frame':    1096 obs. of  6 variables:
##  $ Month      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Day#       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Day        : num  40544 40545 40546 40547 40548 ...
##  $ Price      : num  7.52 7.52 5.95 6.2 6.1 6.2 6.98 5.95 7.12 6.98 ...
##  $ Display?   : num  1 0 0 0 0 0 1 0 1 0 ...
##  $ actualsales: num  390 344 636 483 486 490 524 620 416 464 ...

We have 6 variables:

  1. Month
  2. Day number (essentially a row number)
  3. Day (our date variable, but since we imported from Excel we get Excel's serial-number dates)
  4. Price for that particular day (7 different price points)
  5. Display (a binary variable: 1 indicates our product was on a display that day, 0 indicates it was not)
  6. Actual sales
#Create New Formatted Date Column supplying origin argument
postit$new_date <- as.Date(postit$Day, origin = "1899-12-30")


#Check results
str(postit$new_date)
##  Date[1:1096], format: "2011-01-01" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05" ...
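A quick base-R sanity check on that origin value (a standalone sketch, not using our data frame): Excel on Windows stores dates as days since 1899-12-30, so the first serial numbers in our Day column should map to the first dates of 2011.

```r
# Excel (Windows) serial dates count days since 1899-12-30.
# Serial 40544 should map to the first date in our data, 2011-01-01.
excel_serials <- c(40544, 40545, 40546)
converted <- as.Date(excel_serials, origin = "1899-12-30")
print(converted)  # "2011-01-01" "2011-01-02" "2011-01-03"
```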
#Column renaming
postit <- postit %>% 
  rename(
    day_num = 'Day#',
    display = 'Display?'
  )

#Check results
names(postit)
## [1] "Month"       "day_num"     "Day"         "Price"       "display"    
## [6] "actualsales" "new_date"
#Create list of variables that need transformation
#fac_vars <- c("Month", "display", "Price")
fac_vars <- c("Month", "display")

#Factor Month, Display and Price Variables
postit[,fac_vars] <- lapply(postit[,fac_vars], factor) 

#Check results
str(postit)
## 'data.frame':    1096 obs. of  7 variables:
##  $ Month      : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day_num    : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Day        : num  40544 40545 40546 40547 40548 ...
##  $ Price      : num  7.52 7.52 5.95 6.2 6.1 6.2 6.98 5.95 7.12 6.98 ...
##  $ display    : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 2 1 2 1 ...
##  $ actualsales: num  390 344 636 483 486 490 524 620 416 464 ...
##  $ new_date   : Date, format: "2011-01-01" "2011-01-02" ...
#Drop Excel formatted Day variable and day_num
postit <- postit %>% 
            select(-Day, -day_num)
#From timetk package - visualize sales
postit %>%
  plot_time_series(new_date, actualsales, .interactive = TRUE)

Our plot indicates a positive trend in sales for our Post-its.

A trend in time series data is a pattern in which observations increase or decrease over time. Trends can also change over the course of a series.
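As a minimal illustration of what a trend is, here is a base-R sketch on simulated data (not our Post-it series): fitting a line against time, where a positive slope means an upward trend.

```r
set.seed(42)
# Simulate 100 days with an upward trend plus noise
t <- 1:100
sales <- 500 + 2 * t + rnorm(100, sd = 20)

# Fit a simple linear trend; a positive slope indicates an increasing trend
trend_fit <- lm(sales ~ t)
coef(trend_fit)["t"]  # close to the true slope of 2
```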

Task 1: What factors influence sales

#Let's do some exploratory data analysis (EDA)
ggplot(data = postit, aes(x=actualsales)) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Sales are a bit skewed to the right

ggplot(data=postit, aes(x=log(actualsales))) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Sales now look more normal after log transformation
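To see why the log transform helps, here is a small sketch on simulated right-skewed data; base R has no skewness function, so we compute the moment skewness by hand:

```r
set.seed(1)
x <- rlnorm(1000, meanlog = 6, sdlog = 0.5)  # right-skewed (lognormal) data

# Moment skewness: positive means a long right tail
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skewness(x)       # clearly positive (right skew)
skewness(log(x))  # near zero after the log transform
```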

ggplot(data = postit, aes(x=Month, y = actualsales)) + geom_boxplot() + theme_minimal()

#Sales are lowest in March and generally higher in Q4, peaking in December, which also shows the largest spread. There are some outliers in August and September.

ggplot(data = postit, aes(x=Price, y = actualsales, group = Price)) + geom_boxplot() + theme_minimal()

#This is quite interesting: sales are highest when the price is 5.95, decrease when the price rises to 6.1, and so on, with the lowest sales at the highest price of 7.52. There appears to be a negative relationship between price and sales.

ggplot(data = postit, aes(x=display, y = actualsales)) + geom_boxplot() + theme_minimal()

#More sales when product is on display. 

postit %>%  
  group_by(display) %>% 
  summarize(mean_sales = mean(actualsales), median_sales = median(actualsales), sd_sales = sd(actualsales))
## # A tibble: 2 × 4
##   display mean_sales median_sales sd_sales
##   <fct>        <dbl>        <dbl>    <dbl>
## 1 0             587.          562     184.
## 2 1             639.          619     201.
#Average sales are 639 with a display vs 587 without.

Now that we have some idea of the factors that might influence sales, let's check for significance using a linear regression model.

# Model Spec
model_spec_lm <- linear_reg() %>%
    set_engine('lm') %>% 
    set_mode('regression')
# Fit Linear Regression Model
model_fit_lm <- model_spec_lm %>%
    fit(actualsales ~ Month + Price + display, data = postit)

#Uncomment if want to run model with date as xreg
# model_fit_lm <- model_spec_lm %>%
#     fit(actualsales ~ Month + Price + display + as.numeric(new_date), data = postit)
#Print summary of model in a tidy object
lm_summary <- tidy(model_fit_lm) %>% 
              mutate(significant = p.value <= 0.05)
lm_summary
## # A tibble: 14 × 6
##    term        estimate std.error statistic   p.value significant
##    <chr>          <dbl>     <dbl>     <dbl>     <dbl> <lgl>      
##  1 (Intercept)  1801.       31.2     57.8   0         TRUE       
##  2 Month2          7.17     13.0      0.552 5.81e-  1 FALSE      
##  3 Month3        -95.5      12.7     -7.55  9.56e- 14 TRUE       
##  4 Month4         -5.87     12.8     -0.460 6.46e-  1 FALSE      
##  5 Month5         77.6      12.7      6.13  1.26e-  9 TRUE       
##  6 Month6         78.5      12.8      6.15  1.11e-  9 TRUE       
##  7 Month7         74.0      12.7      5.84  7.03e-  9 TRUE       
##  8 Month8         77.4      12.7      6.10  1.47e-  9 TRUE       
##  9 Month9         86.2      12.8      6.74  2.49e- 11 TRUE       
## 10 Month10       135.       12.7     10.6   3.60e- 25 TRUE       
## 11 Month11       252.       12.8     19.8   1.73e- 74 TRUE       
## 12 Month12       360.       12.7     28.4   8.33e-133 TRUE       
## 13 Price        -196.        4.46   -43.9   2.05e-242 TRUE       
## 14 display1       63.6       6.53     9.75  1.43e- 21 TRUE
#Use the glance function from the broom package to get additional information from the model (e.g. model statistics like r.squared)
(mod_glance <- glance(model_fit_lm))
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.793         0.791  86.3      319.       0    13 -6434. 12899. 12974.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
#If you prefer, you can use summary function to print output to console
summary(model_fit_lm$fit)
## 
## Call:
## stats::lm(formula = actualsales ~ Month + Price + display, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -421.68  -50.85    5.26   52.26  397.36 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1801.392     31.151  57.829  < 2e-16 ***
## Month2         7.174     13.006   0.552    0.581    
## Month3       -95.543     12.663  -7.545 9.56e-14 ***
## Month4        -5.871     12.772  -0.460    0.646    
## Month5        77.631     12.671   6.127 1.26e-09 ***
## Month6        78.488     12.770   6.146 1.11e-09 ***
## Month7        74.032     12.684   5.837 7.03e-09 ***
## Month8        77.357     12.679   6.101 1.47e-09 ***
## Month9        86.232     12.785   6.745 2.49e-11 ***
## Month10      134.656     12.668  10.630  < 2e-16 ***
## Month11      252.429     12.771  19.766  < 2e-16 ***
## Month12      359.843     12.685  28.369  < 2e-16 ***
## Price       -195.501      4.456 -43.877  < 2e-16 ***
## display1      63.640      6.530   9.746  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 86.33 on 1082 degrees of freedom
## Multiple R-squared:  0.793,  Adjusted R-squared:  0.7905 
## F-statistic: 318.8 on 13 and 1082 DF,  p-value: < 2.2e-16

A generic interpretation of a linear regression model: for every one-unit change in a term (the term column), sales move by the value of its estimate, holding all other variables constant.

Most months are significant. December is positive, the most statistically significant, and has the highest average sales; March has the lowest. Time of year does impact sales.

The price coefficient is negative and significant: each one-unit increase in price reduces expected daily sales by about 196 units. Customers seem to be price sensitive when it comes to Post-its. As suggested by our EDA, there is a negative relationship between price and sales.

For display advertising, we can expect an increase of about 64 units in daily sales when the product is on display.
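A quick worked example using the coefficient estimates printed above (January is the baseline month, no display):

```r
# From the model summary: intercept 1801.392, Price slope -195.501
intercept  <- 1801.392
price_coef <- -195.501

# Predicted January sales (baseline month, no display) at two price points
pred_at_595 <- intercept + price_coef * 5.95
pred_at_695 <- intercept + price_coef * 6.95

pred_at_595 - pred_at_695  # a 1-unit price increase costs ~196 units/day
```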

Evaluating model fit

As for the model statistics, we have a very low p-value and a decent R-squared of 0.793, which tells us that about 79% of the variation in sales is explained by our independent/explanatory variables.

We built this linear model mostly to gain insight into which factors influence sales.

Task 2: Build a forecast model for sales

We will split our dataset into training and test data, holding out the last 12 months as the test set; everything before that becomes the training set.
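The split logic can be sketched in base R (a simplified illustration of what time_series_split() does for us; the dates here mirror our data's range):

```r
# Three years of daily dates, like our data (1096 days)
dates <- seq(as.Date("2011-01-01"), as.Date("2013-12-31"), by = "day")

# Hold out the final 12 months (365 days) as the test set
cutoff <- max(dates) - 364
train_idx <- dates < cutoff

sum(train_idx)   # 731 training days
sum(!train_idx)  # 365 test days
```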

#Step 1: Split data into test and training set
splits <- postit %>%
  time_series_split(date_var = new_date, assess = "12 months", cumulative = TRUE)

#Visualize test train split
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(new_date, actualsales, .interactive = TRUE)
# Step 2: Model Spec and Model Fit
model_fit_prophet  <- prophet_reg() %>%
    set_engine('prophet') %>% 
  fit(actualsales ~ ., data = training(splits))
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_prophet
## parsnip model object
## 
## Fit time:  1.8s 
## PROPHET w/ Regressors Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 13
#Step 3: Put model into a modeltime table
models_tbl <- modeltime_table(model_fit_prophet)

models_tbl
## # Modeltime Table
## # A tibble: 1 × 3
##   .model_id .model   .model_desc          
##       <int> <list>   <chr>                
## 1         1 <fit[+]> PROPHET W/ REGRESSORS
#Step 4: Calibrate model
calibration_tbl <- models_tbl %>% 
  modeltime_calibrate(new_data = testing(splits))

#Can also print calibration model to console
calibration_tbl
## # Modeltime Table
## # A tibble: 1 × 5
##   .model_id .model   .model_desc           .type .calibration_data 
##       <int> <list>   <chr>                 <chr> <list>            
## 1         1 <fit[+]> PROPHET W/ REGRESSORS Test  <tibble [365 × 4]>
#Step 5: Get Accuracy Metrics
calibration_tbl %>%
    modeltime_accuracy()
## # A tibble: 1 × 9
##   .model_id .model_desc           .type   mae  mape  mase smape  rmse   rsq
##       <int> <chr>                 <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1         1 PROPHET W/ REGRESSORS Test   73.8  12.8 0.501  11.7  94.8 0.842
#Plot the residuals
# Out-of-Sample data (test set)
#Change new_data argument if you want to plot in-sample residuals (training set). Timeplot is the default but can change to acf or seasonality plot.
calibration_tbl %>%
    modeltime_calibrate(new_data = testing(splits)) %>%
    modeltime_residuals() %>%
    plot_modeltime_residuals(.interactive = TRUE, .type = "timeplot", .legend_show = FALSE)
#Statistical tests
calibration_tbl %>%
    modeltime_residuals_test(new_data = testing(splits))
## # A tibble: 1 × 6
##   .model_id .model_desc          shapiro_wilk box_pierce ljung_box durbin_watson
##       <int> <chr>                       <dbl>      <dbl>     <dbl>         <dbl>
## 1         1 PROPHET W/ REGRESSO…     3.96e-10  0.0000793 0.0000741          1.09

The modeltime_accuracy() function gives a standard set of accuracy metrics: MAE, MAPE, MASE, sMAPE, RMSE and R-squared.

I like to pay attention to the mean absolute percentage error (MAPE), which is about 12.8 here, so our forecasts are off by roughly 13% on average.
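For reference, MAPE is straightforward to compute by hand; a toy sketch of the formula that modeltime_accuracy() reports:

```r
# Toy actuals and predictions
actual <- c(500, 600, 400)
pred   <- c(450, 660, 420)

# Mean absolute percentage error
mape <- mean(abs((actual - pred) / actual)) * 100
mape  # ~8.33
```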

We want our time series residuals to hover around zero. Everything seems okay until November and December, when we start to see more variability. The residual tests point the same way: the small Ljung-Box and Box-Pierce p-values suggest leftover autocorrelation, and a Durbin-Watson statistic of 1.09 (below 2) also indicates positive autocorrelation.

#Step 6: Create future forecast on test data
(forecast_tbl <- calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = postit,
        keep_data = TRUE #Includes the new data (and actual data) as extra columns with the results of the model forecasts
    ))
## # A tibble: 1,461 × 12
##    .model_id .model_desc .key   .index     .value .conf_lo .conf_hi Month Price
##        <int> <chr>       <fct>  <date>      <dbl>    <dbl>    <dbl> <fct> <dbl>
##  1        NA ACTUAL      actual 2011-01-01    390       NA       NA 1      7.52
##  2        NA ACTUAL      actual 2011-01-02    344       NA       NA 1      7.52
##  3        NA ACTUAL      actual 2011-01-03    636       NA       NA 1      5.95
##  4        NA ACTUAL      actual 2011-01-04    483       NA       NA 1      6.2 
##  5        NA ACTUAL      actual 2011-01-05    486       NA       NA 1      6.1 
##  6        NA ACTUAL      actual 2011-01-06    490       NA       NA 1      6.2 
##  7        NA ACTUAL      actual 2011-01-07    524       NA       NA 1      6.98
##  8        NA ACTUAL      actual 2011-01-08    620       NA       NA 1      5.95
##  9        NA ACTUAL      actual 2011-01-09    416       NA       NA 1      7.12
## 10        NA ACTUAL      actual 2011-01-10    464       NA       NA 1      6.98
## # … with 1,451 more rows, and 3 more variables: display <fct>,
## #   actualsales <dbl>, new_date <date>
#Step 7: Plot modeltime forecast - this is the test data
plot_modeltime_forecast(forecast_tbl)

Forecast sales into the future

#Create a tibble of future dates using future_frame() from the timetk package. The .length_out argument controls how far past our date variable (new_date) to extend: since new_date ends on Dec 31st, the future_frame starts on Jan 1st and runs one year into the future.
dates <- postit %>% 
  future_frame(new_date, .length_out = "1 year")

#Simulate display data
display <- rep(0:1, each = 2, length.out = 365)

#Simulate Price data
Price <- rep(c(5.95, 6.1, 6.2, 6.98, 7.12, 7.32, 7.52), length.out = 365)

#Put data into dataframe
explanatory_data <- cbind(dates, display, Price)

#Add additional Month and day_num variables
explanatory_data <- explanatory_data %>% 
  mutate(Month = format(new_date, "%m"),
         day_num = format(new_date, "%d"))

#Need to factor variables 
#fac_vars <- c("Month", "display", "Price")
fac_vars <- c("Month", "display")

#Factor Month, Display and Price Variables
explanatory_data[,fac_vars] <- lapply(explanatory_data[,fac_vars], factor) 

#Change day_num from character to numeric vector and strip leading zeros
library(stringr)
explanatory_data$day_num <- as.numeric(str_remove(explanatory_data$day_num, "^0+")) 

#Do the same for the Month variable
explanatory_data$Month <- as.factor(str_remove(explanatory_data$Month, "^0+"))
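A side note on the leading-zero handling: as.numeric() already ignores leading zeros on its own, so the stripping mainly matters for the factor Month, whose character levels must match the training data's "1" through "12". A small base-R sketch (using sub() instead of str_remove() to stay dependency-free):

```r
# For numeric conversion, leading zeros are harmless
as.numeric("01") == 1  # TRUE

# For factors, "01" and "1" are different level labels, which would not
# match the training data's levels "1".."12"
lv_raw     <- levels(factor(c("01", "02")))                   # "01" "02"
lv_cleaned <- levels(factor(sub("^0+", "", c("01", "02"))))   # "1"  "2"
```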


#Check results
str(explanatory_data)
## 'data.frame':    365 obs. of  5 variables:
##  $ new_date: Date, format: "2014-01-01" "2014-01-02" ...
##  $ display : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 1 ...
##  $ Price   : num  5.95 6.1 6.2 6.98 7.12 7.32 7.52 5.95 6.1 6.2 ...
##  $ Month   : Factor w/ 12 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day_num : num  1 2 3 4 5 6 7 8 9 10 ...

# #Create tibble of dates using future_frame() from timetk package
# dates <- postit %>%
#   future_frame(new_date, .length_out = "1 year")
# 
# #Simulate Price data
# #We will keep price constant at 5.95
# Price <- rep(5.95, length.out = 365)
# 
# #Put dates and Price into data frame
# explanatory_data <- cbind(dates, Price)
# 
# #Add additional Month and day_num variables
# explanatory_data <- explanatory_data %>%
#   mutate(Month = format(new_date, "%m"),
#          day_num = format(new_date, "%d"))
# 
# #Simulate display data - we want to be purposeful and use display on months where sales are the lowest in order to boost sales - February, March & April
# explanatory_data <- explanatory_data %>%
#   mutate(display = ifelse(Month%in% c('02', '03', '04'), 1, 0))
# 
# #Check results
# table(explanatory_data$display)
# 
# #Need to factor variables
# fac_vars <- c("Month", "display", "Price")
# 
# #Factor Month, Display and Price Variables
# explanatory_data[,fac_vars] <- lapply(explanatory_data[,fac_vars], factor)
# 
# #Change day_num from character to numeric vector and strip leading zeros
# library(stringr)
# explanatory_data$day_num <- as.numeric(str_remove(explanatory_data$day_num, "^0+"))
# 
# #Do the same for the Month variable
# explanatory_data$Month <- as.factor(str_remove(explanatory_data$Month, "^0+"))
# 
# #Reorder columns
# explanatory_data <- explanatory_data %>%
#   select(Month, Price, display, new_date)
# 
# #Check results
# str(explanatory_data)

CAUTION: It is very important that the data in your new data frame matches the exact formatting of the data used to build the forecast. If errors occur during either the model fit or forecasting phase, differently formatted data may be the culprit.
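One way to catch such mismatches early is to compare column classes and factor levels before forecasting. A generic sketch with toy data frames (the column names mirror ours, but the check works for any data):

```r
# Toy "training" and "new" data frames
train_df <- data.frame(Month = factor(c("1", "2")), Price = c(5.95, 6.10))
new_df   <- data.frame(Month = factor(c("1", "13")), Price = c(5.95, 6.10))

# Column classes should match column-for-column
identical(sapply(train_df, class), sapply(new_df, class))  # TRUE

# Any factor levels in the new data that the model never saw?
setdiff(levels(new_df$Month), levels(train_df$Month))  # "13" -> a problem
```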

#First, refit to the full dataset
refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = postit)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#Specify the h (horizon) argument to get a forecast into the future after refitting the model to the entire dataset. But if using xregs (independent regressors), you must instead supply a new dataframe containing the xregs via new_data.

#Forecast on the new tibble dataframe
forecast_tbl_future_data <- refit_tbl %>%
    modeltime_forecast(
        new_data    = explanatory_data
    )

#Check results of forecast
head(forecast_tbl_future_data)
## # A tibble: 6 × 7
##   .model_id .model_desc           .key       .index     .value .conf_lo .conf_hi
##       <int> <chr>                 <fct>      <date>      <dbl>    <dbl>    <dbl>
## 1         1 PROPHET W/ REGRESSORS prediction 2014-01-01   736.     550.     922.
## 2         1 PROPHET W/ REGRESSORS prediction 2014-01-02   687.     502.     873.
## 3         1 PROPHET W/ REGRESSORS prediction 2014-01-03   724.     538.     909.
## 4         1 PROPHET W/ REGRESSORS prediction 2014-01-04   570.     384.     756.
## 5         1 PROPHET W/ REGRESSORS prediction 2014-01-05   462.     276.     648.
## 6         1 PROPHET W/ REGRESSORS prediction 2014-01-06   396.     211.     582.
#Plot and visualize the forecasts
plot_modeltime_forecast(forecast_tbl_future_data, .interactive = TRUE)

Data Visualization of forecasts for business stakeholders

# Install package if needed and load library
if (!require("reactable")) install.packages("reactable")
## Loading required package: reactable
library(reactable)

#Subset the date and prediction columns
final_fc <- forecast_tbl_future_data %>%
  select(.index, .value) %>% 
  mutate(month_year = format(.index, "%m-%Y"))

#Group by month and sum sales
final_fc_gr <- final_fc %>% 
  group_by(month_year) %>% 
  summarize(sales = sum(.value))

#Put forecasts into a nice and pretty table
#I am using the reactable library
fc_table <- final_fc_gr %>% 
  reactable(resizable = TRUE, bordered = TRUE, defaultPageSize = 12, striped = TRUE, columns = list(
sales = colDef(format = colFormat(prefix = "$", separators = TRUE, digits = 2))), theme = reactableTheme(stripedColor = "#e7edf5"))

#Visualize table
fc_table
# Forecasts in a barplot
ggplot(final_fc_gr, aes(x=month_year, y=sales)) + 
  geom_bar(stat = "identity", fill="steelblue") + theme_calc() + ylab(" ") + xlab(" ") + ggtitle("2014 Sales Forecast", subtitle = "Assuming display and pricing structure remains unchanged from prior year") + theme(legend.position=c(0.2,.85), plot.title = element_text(family = "Arial", face = "bold", colour = "darkblue", size = 24, hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) 

#Subset last year of postit data
postit_2013 <- postit %>% 
  filter(new_date >= "2013-01-01") %>% 
  mutate(month_year = format(new_date, "%m-%Y"))
  
#Check results
str(postit_2013)
## 'data.frame':    365 obs. of  6 variables:
##  $ Month      : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Price      : num  7.32 7.52 6.1 5.95 7.52 6.2 7.12 7.32 7.12 7.32 ...
##  $ display    : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 2 ...
##  $ actualsales: num  387 396 536 688 400 591 405 448 416 451 ...
##  $ new_date   : Date, format: "2013-01-01" "2013-01-02" ...
##  $ month_year : chr  "01-2013" "01-2013" "01-2013" "01-2013" ...
#Create grouped dataframe - group by month and sum sales
postit_2013_gr <- postit_2013 %>% 
  group_by(month_year) %>% 
  summarize(sales = sum(actualsales))

#Combine data into one dataframe - bind by row
combined_df <- rbind(final_fc_gr, postit_2013_gr)

#Add year variable
combined_df <- combined_df %>% 
  mutate(year_of_sales =
    case_when(
  grepl("2013", month_year) ~ 2013,
  grepl("2014", month_year) ~ 2014)
  )

#Check results
head(combined_df)
## # A tibble: 6 × 3
##   month_year  sales year_of_sales
##   <chr>       <dbl>         <dbl>
## 1 01-2014    18309.          2014
## 2 02-2014    16500.          2014
## 3 03-2014    14804.          2014
## 4 04-2014    17394.          2014
## 5 05-2014    20675.          2014
## 6 06-2014    19707.          2014
#Bar plot using geom_col
ggplot(combined_df, aes(x=sales, y = month_year, fill = factor(year_of_sales))) +
  geom_col(position = "dodge")

# Does the same as above using geom_bar
ggplot(combined_df, aes(x=sales, y = factor(month_year), fill = factor(year_of_sales))) +
  geom_bar(stat="identity", position=position_dodge()) + theme_minimal()

#Prep 2013 sales data by renaming sales column
postit_2013_gr_01 <- rename(postit_2013_gr, sales_2013 = sales, month_year_2013 = month_year)

#Prep forecast data by renaming sales column
final_fc_gr_01 <- rename(final_fc_gr, sales_2014_forecast = sales, month_year_2014 = month_year)

#Combine data into one dataframe - bind by column
combined_df_col <- cbind(postit_2013_gr_01, final_fc_gr_01)


#Extract a month column
combined_df_col <- combined_df_col %>% 
  mutate(month_of_yr  = month.name[as.numeric(substr(c(month_year_2013),1,2))],
         month  = as.numeric(substr(c(month_year_2013),1,2)))

#Check results
combined_df_col
##    month_year_2013 sales_2013 month_year_2014 sales_2014_forecast month
## 1          01-2013      15507         01-2014            18309.46     1
## 2          02-2013      15195         02-2014            16499.97     2
## 3          03-2013      13025         03-2014            14803.61     3
## 4          04-2013      16399         04-2014            17394.38     4
## 5          05-2013      20212         05-2014            20675.00     5
## 6          06-2013      18677         06-2014            19706.71     6
## 7          07-2013      18505         07-2014            20611.06     7
## 8          08-2013      20222         08-2014            20552.60     8
## 9          09-2013      16816         09-2014            19914.00     9
## 10         10-2013      21594         10-2014            22812.22    10
## 11         11-2013      23761         11-2014            25037.96    11
## 12         12-2013      28799         12-2014            29399.21    12
##    month_of_yr
## 1      January
## 2     February
## 3        March
## 4        April
## 5          May
## 6         June
## 7         July
## 8       August
## 9    September
## 10     October
## 11    November
## 12    December
#Plot the results
ggplot(combined_df_col, aes(x=month)) + 
  geom_line(aes(x=month, y = sales_2013, color = "sales_2013"), size=1) + 
  geom_line(aes(x=month, y = sales_2014_forecast, color="sales_2014_forecast"), size=1, linetype="twodash")+ 
  theme_calc() + 
  ylab("  ") + 
  ggtitle("Comparison of current annual sales \n with next year's forecasted sales", subtitle = "Assuming the same pricing and display promotions of the current year") +
  scale_x_continuous(breaks = seq(from = 0, to = 12, by = 1)) +   
  scale_color_manual(name = " ", values = c("sales_2013" = "darkred", "sales_2014_forecast" = "steelblue")) +
  theme(legend.position=c(0.2,.85), plot.title = element_text(family = "Arial", face = "bold", colour = "darkblue", size = 24, hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) 

FINAL THOUGHTS:

I really enjoyed performing time series analysis using the modeltime package. As a tidyverse user, I find this way of working with machine learning models streamlined, functional and flexible, with the added benefit of being able to compare many models at once.